    Combining sequence data from multiple studies: Impact of analysis strategies on rare variant calling and association results

    Individual sequencing studies often have limited sample sizes and therefore limited power to detect trait associations with rare variants. A common strategy is to aggregate data from multiple studies. For studying rare variants, jointly calling all samples together is the gold‐standard strategy, but it can be difficult to implement because of privacy restrictions and computational burden. Here, we compare joint calling with the alternative of single‐study calling in terms of variant detection sensitivity and genotype accuracy as a function of sequencing coverage, and we assess their impact on downstream association analysis. To do so, we analyze deep‐coverage (~82×) exome and low‐coverage (~5×) genome sequence data on 2,250 individuals from the Genetics of Type 2 Diabetes study, both jointly and separately within five geographic cohorts. For rare single nucleotide variants (SNVs): (a) ≥97% of discovered SNVs are found by both calling strategies; (b) nonreference concordance with a set of highly accurate genotypes is ≥99% for both calling strategies; (c) meta‐analysis has similar power to joint analysis in deep‐coverage sequence data but can be less powerful in low‐coverage sequence data. Given similar data processing and quality control steps, we recommend single‐study calling as a viable alternative to joint calling for analyzing SNVs of all minor allele frequencies in deep‐coverage data.
    Peer Reviewed
    https://deepblue.lib.umich.edu/bitstream/2027.42/153654/1/gepi22261_am.pdf
    https://deepblue.lib.umich.edu/bitstream/2027.42/153654/2/gepi22261.pdf
    https://deepblue.lib.umich.edu/bitstream/2027.42/153654/3/gepi22261-sup-0002-final_revised_supp_figures_7_19_2019.pd
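    Nonreference concordance, used above to compare the two calling strategies against highly accurate genotypes, can be illustrated with a short sketch. It assumes a commonly used definition (matching genotypes divided by all compared genotypes, after excluding sites where both calls are homozygous reference) and genotypes coded as alternate-allele counts; the study's exact definition and handling of missing calls may differ.

```python
from typing import Sequence


def nonreference_concordance(calls: Sequence[int], truth: Sequence[int]) -> float:
    """Nonreference concordance between called and reference ("truth") genotypes.

    Genotypes are coded as alternate-allele counts (0, 1, 2). Pairs where both
    genotypes are homozygous reference are skipped, so the many easy ref/ref
    matches at rare-variant sites do not inflate the metric.
    """
    if len(calls) != len(truth):
        raise ValueError("genotype vectors must have equal length")
    compared = 0
    matches = 0
    for c, t in zip(calls, truth):
        if c == 0 and t == 0:
            continue  # ref/ref pair: uninformative, excluded
        compared += 1
        if c == t:
            matches += 1
    if compared == 0:
        raise ValueError("no nonreference genotypes to compare")
    return matches / compared


# Hypothetical example: one heterozygote is missed by the caller.
calls = [0, 1, 2, 0, 1, 0, 0]
truth = [0, 1, 2, 1, 1, 0, 0]
print(nonreference_concordance(calls, truth))
```

    Here four genotype pairs remain after dropping the ref/ref pairs, and three of them match.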

    Improving power for rare‐variant tests by integrating external controls

    Due to the drop in sequencing costs, the number of sequenced genomes is increasing rapidly. To improve the power of rare‐variant tests, these sequenced samples could be used as external control samples in addition to control samples from the study itself. However, when external controls are used, possible batch effects due to different sequencing platforms or genotype calling pipelines can dramatically inflate type I error rates. To address this, we propose novel summary‐statistics‐based single‐variant and gene‐ or region‐based rare‐variant tests that allow the integration of external controls while controlling the type I error rate. Our approach is based on the insight that batch effects on a given variant can be assessed by comparing odds ratio estimates using internal controls only vs. using the combined set of internal and external controls. From simulation experiments and the analysis of data from age‐related macular degeneration and type 2 diabetes studies, we demonstrate that our method can substantially improve power while controlling the type I error rate.
    Peer Reviewed
    https://deepblue.lib.umich.edu/bitstream/2027.42/138932/1/gepi22057.pdf
    https://deepblue.lib.umich.edu/bitstream/2027.42/138932/2/gepi22057_am.pd
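    The insight described above, comparing the odds ratio estimated with internal controls only against the one estimated with pooled internal and external controls, can be sketched for a single variant. This is an illustrative assumption, not the authors' test: it uses allelic counts and a 0.5 continuity correction, and it reports only the raw shift in the log odds ratio, without the variance estimation and type I error control that a real test requires. All names and counts are hypothetical.

```python
import math
from typing import Tuple

Counts = Tuple[int, int]  # (alternate-allele count, reference-allele count)


def log_odds_ratio(case: Counts, control: Counts) -> float:
    """Allelic log odds ratio with a 0.5 (Haldane-Anscombe) continuity correction."""
    # The correction keeps the estimate finite when a rare variant has zero counts.
    a, b = case[0] + 0.5, case[1] + 0.5
    c, d = control[0] + 0.5, control[1] + 0.5
    return math.log((a * d) / (b * c))


def combined_minus_internal(case: Counts, internal: Counts, external: Counts) -> float:
    """Shift in the log OR when external controls are pooled with internal ones.

    A shift near zero suggests the external controls resemble the internal ones
    at this variant; a large shift hints at a batch effect.
    """
    pooled = (internal[0] + external[0], internal[1] + external[1])
    return log_odds_ratio(case, pooled) - log_odds_ratio(case, internal)


case, internal = (10, 990), (5, 995)
# External controls with an allele frequency similar to the internal controls:
print(combined_minus_internal(case, internal, (50, 9950)))
# External controls with an inflated allele frequency (e.g. a pipeline artifact):
print(combined_minus_internal(case, internal, (200, 9800)))
```

    In the second call the pooled estimate flips the direction of association, which is the kind of discrepancy that flags a variant as affected by batch effects.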

    Genetic association studies for gene expressions: permutation-based mutual information in a comparison with standard ANOVA and as a novel approach for feature selection

    Mutual information (MI) is a robust nonparametric statistical approach for identifying associations between genotypes and gene expression levels. Using the data of Problem 1 provided for Genetic Analysis Workshop 15, we first compared a quantitative MI (Tsalenko et al. 2006 J Bioinform Comput Biol 4:259–4) with standard analysis of variance (ANOVA) and the nonparametric Kruskal-Wallis (KW) test. We then proposed a novel feature selection approach using MI in a classification scenario to address the small‐n, large‐p problem and compared it with a feature selection method that relies on an asymptotic χ² distribution. In both applications, we used a permutation‐based approach to evaluate the significance of MI. Substantial discrepancies in significance were observed between MI, ANOVA, and KW, which can be explained by different empirical distributions of the data. In contrast to ANOVA and KW, MI detects shifts in location when the data are non‐normally distributed, skewed, or contaminated with outliers. ANOVA, but not MI, is often significant when a low‐frequency genotype shows a marked difference in average gene expression level relative to the other two genotypes; because MI depends on genotype frequencies, it cannot detect such differences. In the classification scenario, we show that our novel feature selection approach identifies a smaller list of markers with higher accuracy than the standard method. In conclusion, permutation‐based MI approaches provide reliable and flexible statistical frameworks that seem well suited for data that are non‐normal, skewed, or otherwise peculiarly distributed. They merit further methodological investigation.
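    A permutation-based MI test between genotype and expression can be sketched as follows. This swaps in a simple approach and is not the quantitative MI of Tsalenko et al.: expression is assumed to be discretized into sample tertiles, MI is the plug-in estimate from counts, and significance comes from shuffling the expression bins relative to genotypes. The function names, tertile binning, and permutation count are assumptions for illustration.

```python
import math
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def discretize_tertiles(values: Sequence[float]) -> List[int]:
    """Map continuous expression values to bins 0, 1, 2 by sample tertiles."""
    ranked = sorted(values)
    n = len(ranked)
    cut1, cut2 = ranked[n // 3], ranked[(2 * n) // 3]
    return [0 if v < cut1 else 1 if v < cut2 else 2 for v in values]


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Plug-in estimate of mutual information (in nats) between two discrete variables."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # Sum over observed cells of p(x,y) * log(p(x,y) / (p(x) * p(y))), written with counts.
    return sum((nxy / n) * math.log(nxy * n / (px[a] * py[b]))
               for (a, b), nxy in joint.items())


def permutation_mi_test(genotypes: Sequence[int], expression: Sequence[float],
                        n_perm: int = 999, seed: int = 0) -> Tuple[float, float]:
    """Return (observed MI, permutation p-value) for genotype vs. binned expression."""
    bins = discretize_tertiles(expression)
    observed = mutual_information(genotypes, bins)
    rng = random.Random(seed)
    shuffled = list(bins)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-expression link (null hypothesis)
        if mutual_information(genotypes, shuffled) >= observed - 1e-12:
            exceed += 1
    # Add-one correction: the observed labelling counts as one permutation.
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical data: expression rises with the number of alternate alleles.
geno = [0] * 10 + [1] * 10 + [2] * 10
expr = [g + 0.01 * i for i, g in enumerate(geno)]
print(permutation_mi_test(geno, expr))
```

    Because the permutation distribution is built from the data themselves, the test needs no distributional assumption, which matches the motivation given above for skewed or outlier-prone expression data.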

    Exome sequencing of 20,791 cases of type 2 diabetes and 24,440 controls

    Protein-coding genetic variants that strongly affect disease risk can yield relevant clues to disease pathogenesis. Here we report exome-sequencing analyses of 20,791 individuals with type 2 diabetes (T2D) and 24,440 non-diabetic control participants from 5 ancestries. We identify gene-level associations of rare variants (with minor allele frequencies of less than 0.5%) in 4 genes at exome-wide significance, including a series of more than 30 SLC30A8 alleles that convey protection against T2D, and in 12 gene sets, including those corresponding to T2D drug targets (P = 6.1 × 10⁻³) and candidate genes from knockout mice (P = 5.2 × 10⁻³). Within our study, the strongest T2D gene-level signals for rare variants explain at most 25% of the heritability of the strongest common single-variant signals, and the gene-level effect sizes of the rare variants that we observed in established T2D drug targets will require 75,000–185,000 sequenced cases to achieve exome-wide significance. We propose a method to interpret these modest rare-variant associations and to incorporate these associations into future target or gene prioritization efforts.
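    A gene-level rare-variant association can be illustrated with a simple carrier-based burden (collapsing) test. This is a sketch under assumptions, not the study's method: an individual counts as a carrier if they have at least one alternate allele at any variant in the gene with MAF below 0.5% (the threshold quoted above), MAFs are supplied precomputed, and carrier counts in cases and controls are compared with a two-sided Fisher's exact test. Real analyses typically also use variant annotation masks and adjust for ancestry and other covariates. The function names are hypothetical.

```python
from math import comb
from typing import List, Sequence

RARE_MAF = 0.005  # "minor allele frequencies of less than 0.5%"


def carriers(genotypes: Sequence[Sequence[int]], mafs: Sequence[float]) -> List[bool]:
    """True for each person with >= 1 alt allele at any variant with MAF < RARE_MAF."""
    qualifying = [j for j, f in enumerate(mafs) if f < RARE_MAF]
    return [any(person[j] > 0 for j in qualifying) for person in genotypes]


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell given fixed margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    support = range(max(0, row1 + col1 - n), min(row1, col1) + 1)
    # Sum over all tables no more probable than the observed one (small tolerance).
    return min(1.0, sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-7)))


def burden_test(case_genotypes, control_genotypes, mafs) -> float:
    """Gene-level collapsing test: compare carrier counts in cases vs. controls."""
    a = sum(carriers(case_genotypes, mafs))
    c = sum(carriers(control_genotypes, mafs))
    return fisher_exact_two_sided(a, len(case_genotypes) - a,
                                  c, len(control_genotypes) - c)


# Hypothetical gene with one rare variant and one common variant (ignored by the mask).
mafs = [0.001, 0.2]
cases = [[1, 0]] * 8 + [[0, 1]] * 2
controls = [[1, 1]] + [[0, 0]] * 5
print(burden_test(cases, controls, mafs))
```

    Collapsing pools the signal of many individually very rare alleles, such as the 30+ SLC30A8 alleles mentioned above, into one carrier indicator per person, which is why gene-level tests can reach significance where single-variant tests cannot.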

    Evaluation of the role of STAP1 in Familial Hypercholesterolemia

    Familial hypercholesterolemia (FH) is characterised by elevated serum levels of low-density lipoprotein cholesterol (LDL-C) and a substantial risk for cardiovascular disease. Autosomal-dominant FH is mostly caused by mutations in LDLR (low density lipoprotein receptor), APOB (apolipoprotein B), and PCSK9 (proprotein convertase subtilisin/kexin type 9). Recently, STAP1 has been suggested as a fourth causative gene. We analyzed STAP1 in 75 hypercholesterolemic patients from Berlin, Germany, who were negative for mutations in the canonical FH genes. In 10 patients with a negative family history, we additionally screened for disease-causing variants in LDLRAP1 (low density lipoprotein receptor adaptor protein 1), which is associated with autosomal-recessive hypercholesterolemia. We identified one STAP1 variant predicted to be disease-causing. To evaluate the association between serum lipid levels and STAP1 carrier status, we analyzed 20 individuals carrying rare STAP1 variants from a population-based cohort, the Cooperative Health Research in South Tyrol (CHRIS) study. From the same cohort, we randomly selected 100 non-carriers as controls. In the Berlin FH cohort, STAP1 variants were rare. In the CHRIS cohort, we found no statistically significant differences in lipid traits between carriers and non-carriers of STAP1 variants. Until such an association has been verified in more individuals with genetic variants in STAP1, we cannot determine whether STAP1 is generally a causative gene for FH.

    A Remote Secondary Binding Pocket Promotes Heteromultivalent Targeting of DC-SIGN

    Dendritic cells (DC) are antigen-presenting cells that coordinate the interplay of the innate and adaptive immune responses. The endocytic C-type lectin receptors (CLRs) DC-SIGN and Langerin display expression profiles restricted to distinct DC subtypes and have emerged as prime targets for next-generation immunotherapies and anti-infectives. Using heteromultivalent liposomes copresenting mannosides bearing aromatic aglycones with natural glycan ligands, we serendipitously discovered striking cooperativity effects for DC-SIGN+ but not Langerin+ cell lines. Mechanistic investigations combining NMR spectroscopy with molecular docking and molecular dynamics simulations led to the identification of a secondary binding pocket for the glycomimetics. This pocket, located remote from DC-SIGN’s carbohydrate binding site, can be leveraged for heteromultivalent avidity enhancement. We further present preliminary evidence that the aglycone allosterically activates glycan recognition and thereby contributes to DC-SIGN-specific cell targeting. Our findings have important implications for both translational and basic glycoscience, showcasing heteromultivalent targeting of DCs to improve specificity and supporting potential allosteric regulation of DC-SIGN and of CLRs in general.

    The genetic architecture of type 2 diabetes

    The genetic architecture of common traits, including the number, frequency, and effect sizes of inherited variants that contribute to individual risk, has long been debated. Genome-wide association studies have identified scores of common variants associated with type 2 diabetes, but in aggregate these explain only a fraction of the heritability. To test the hypothesis that lower-frequency variants explain much of the remainder, the GoT2D and T2D-GENES consortia performed whole genome sequencing in 2,657 Europeans with and without diabetes, and exome sequencing in a total of 12,940 subjects from five ancestral groups. To increase statistical power, we expanded the sample size via genotyping and imputation in a further 111,548 subjects. Variants associated with type 2 diabetes after sequencing were overwhelmingly common, and most fell within regions previously identified by genome-wide association studies. Comprehensive enumeration of sequence variation is necessary to identify functional alleles that provide important clues to disease pathophysiology, but large-scale sequencing does not support a major role for lower-frequency variants in predisposition to type 2 diabetes.

    Sequence data and association statistics from 12,940 type 2 diabetes cases and controls

    To investigate the genetic basis of type 2 diabetes (T2D) at high resolution, the GoT2D and T2D-GENES consortia catalogued variation from whole-genome sequencing of 2,657 European individuals and exome sequencing of 12,940 individuals of multiple ancestries. Over 27M SNPs, indels, and structural variants were identified, including 99% of low-frequency (minor allele frequency [MAF] 0.1–5%) non-coding variants in the whole-genome sequenced individuals and 99.7% of low-frequency coding variants in the whole-exome sequenced individuals. Each variant was tested for association with T2D in the sequenced individuals, and, to increase power, most were tested in larger numbers of individuals (>80% of low-frequency coding variants in ~82K Europeans via the exome chip, and ~90% of low-frequency non-coding variants in ~44K Europeans via genotype imputation). The variants, genotypes, and association statistics from these analyses provide the largest reference to date of human genetic information relevant to T2D, for use in activities such as T2D-focused genotype imputation, functional characterization of variants or genes, and other novel analyses to detect associations between sequence variation and T2D.
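    The frequency categories used above can be made concrete with a small sketch that estimates MAF from diploid genotypes and bins it. Only the "low-frequency" range (MAF 0.1–5%) comes from the text; the labels "rare" (below 0.1%) and "common" (above 5%), the treatment of the exact boundaries, and the assumption of no missing genotypes are illustrative choices.

```python
from typing import Sequence


def minor_allele_frequency(dosages: Sequence[int]) -> float:
    """MAF from diploid alternate-allele counts (0, 1, 2), assuming no missing data."""
    if not dosages:
        raise ValueError("no genotypes")
    alt_freq = sum(dosages) / (2 * len(dosages))
    # The minor allele may be the reference allele, so take the smaller frequency.
    return min(alt_freq, 1.0 - alt_freq)


def frequency_class(maf: float) -> str:
    """Bin a MAF; 'low-frequency' follows the 0.1-5% range, other labels are assumed."""
    if maf < 0.001:
        return "rare"
    if maf <= 0.05:
        return "low-frequency"
    return "common"


# Hypothetical variant: 2 heterozygotes among 100 individuals -> MAF = 2 / 200.
maf = minor_allele_frequency([0] * 98 + [1, 1])
print(maf, frequency_class(maf))
```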